Input Format

Table 1 summarizes the input format, which consists of a list of <specifier, attribute> pairs. These attributes are the essential tokens and delimiters needed to scan the input index file. Default string constants are enclosed in double quotes ("...") and character constants in single quotes ('x'). The user can override a default value by giving the specifier and a new attribute in the format file or property sheet. The attribute of keyword is self-explanatory; arg_open and arg_close denote the argument opening and closing delimiters, respectively. The meanings of special operators such as level, actual, and encap are described above.

The two range delimiters range_open and range_close are used with the encap operator. When range_open immediately follows encap (i.e., \index{...|(...}), it tells the index processor that an explicit range is starting. Conversely, range_close signals the closing of a range. In our design, three or more successive page numbers are implicitly abbreviated as a range. This implicit range formation can be turned off if an indexed term represents logically distinct concepts on different pages. When the implicit range is disabled, explicit page ranges can still be enforced by using the two range delimiters range_open and range_close. With explicit ranges, it is possible to index an entire section or a large piece of text related to a certain concept without having to insert an index command on every single page.
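The implicit range formation described above can be illustrated with a minimal Python sketch. It is not the processor's actual code: the function name collapse_pages, the min_run and implicit parameters, and the "--" range separator are all assumptions made for illustration, and pages are taken to be plain integers.

```python
def collapse_pages(pages, min_run=3, implicit=True):
    """Abbreviate runs of at least min_run successive pages as 'first--last'.

    Hypothetical helper: the name, parameters, and '--' separator are
    illustrative assumptions, not part of the original design.
    """
    pages = sorted(set(pages))
    out = []
    i = 0
    while i < len(pages):
        j = i
        # Extend j to the end of the run of successive pages starting at i.
        while j + 1 < len(pages) and pages[j + 1] == pages[j] + 1:
            j += 1
        run_length = j - i + 1
        if implicit and run_length >= min_run:
            out.append(f"{pages[i]}--{pages[j]}")
        else:
            # Short runs, or implicit ranges disabled: list pages one by one.
            out.extend(str(p) for p in pages[i:j + 1])
        i = j + 1
    return out


print(collapse_pages([1, 2, 3, 5, 7, 8]))
print(collapse_pages([1, 2, 3, 5, 7, 8], implicit=False))
```

With implicit=False every page is listed individually, which is the situation in which explicit range_open and range_close markers become useful.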

The quote operator is used to escape symbols. Thus \index{foo"@goo} means a sort key of foo@goo rather than a sort key of foo" and an actual key of goo. As an exception, quote, when preceded by escape (i.e., \index{...\"...}), does not escape the character that follows it. This special case is included because \" is the umlaut command in TeX. Requiring quote itself to be quoted in this case (i.e., \"") is feasible but somewhat awkward. Note that quote and escape must be distinct.
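The quoting rule and its exception can be sketched in Python as follows. This is only an illustration under stated assumptions: split_keys is a hypothetical name, the defaults '@', '"', and '\' are taken from the examples in the text, and cases the text does not discuss (a second actual operator, an escaped escape character) are handled in the simplest way rather than as the real scanner might.

```python
def split_keys(arg, actual="@", quote='"', escape="\\"):
    """Split one index argument into (sort_key, actual_key).

    Hypothetical sketch: a quote strips the special meaning of the next
    character, except when the quote itself is preceded by escape.
    """
    parts = [[]]  # character lists: sort key, then (optionally) actual key
    i = 0
    while i < len(arg):
        ch = arg[i]
        if ch == escape and arg[i + 1:i + 2] == quote:
            # Escape followed by quote (the TeX umlaut): keep both verbatim,
            # and let the character after the quote keep its usual meaning.
            parts[-1].append(escape + quote)
            i += 2
        elif ch == quote and i + 1 < len(arg):
            # Drop the quote and take the next character literally.
            parts[-1].append(arg[i + 1])
            i += 2
        elif ch == actual and len(parts) == 1:
            # First unquoted actual operator: start the actual key.
            parts.append([])
            i += 1
        else:
            parts[-1].append(ch)
            i += 1
    sort_key = "".join(parts[0])
    actual_key = "".join(parts[1]) if len(parts) > 1 else sort_key
    return sort_key, actual_key


print(split_keys('foo"@goo'))
print(split_keys('foo@goo'))
print(split_keys('a\\"@b'))
```

In the third call the argument is a\"@b; because the quote follows escape, the @ is still treated as the actual operator.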

A page number can be a composite of one or more fields separated by the delimiter bound to page_compositor (e.g., II-12 for page 12 of Chapter II). This attribute allows the lexical analyzer to separate these fields, simplifying the sorting of page numbers.
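To show how splitting on the page compositor can simplify sorting, here is a small Python sketch. The names page_sort_key and roman_to_int are hypothetical, the "-" default follows the II-12 example, and the decision to rank roman fields before arabic ones before anything else is an arbitrary assumption rather than the ordering the processor actually uses.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(s):
    """Convert an uppercase roman numeral to an integer (no validation)."""
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        v = ROMAN[ch]
        # A smaller value before a larger one is subtracted (e.g. IV = 4).
        total += -v if ROMAN.get(nxt, 0) > v else v
    return total


def page_sort_key(page, compositor="-"):
    """Turn a composite page number into a tuple that sorts field by field.

    Each field becomes (rank, value); the rank keeps integers and strings
    from ever being compared with each other. Ranks are an assumption.
    """
    key = []
    for field in page.split(compositor):
        if field.isdigit():
            key.append((1, int(field)))
        elif field and all(c in ROMAN for c in field.upper()):
            key.append((0, roman_to_int(field.upper())))
        else:
            key.append((2, field))
    return tuple(key)


print(sorted(["II-12", "I-3", "II-2", "X-1"], key=page_sort_key))
```

Comparing the tuples numerically places II-2 before II-12, which a plain string comparison would not do.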